Introducing the Per-Fide Project: Parallelizing Portuguese with six different Languages

Authors

  • Sílvia Araújo
  • Ana Correia
  • Ana Oliveira
  • Alberto Simões
Abstract

In this paper we present the Per-Fide project, which aims to build parallel corpora pairing Portuguese with six other languages (English, Russian, French, Italian, German and Spanish) across several domains, including literary, journalistic and religious texts. First, we focus on the corpus design criteria and main features, particularly those that distinguish this corpus from existing parallel corpora. Second, we discuss the challenges of developing a typology of text types for the religious domain and the problems of encoding texts in this category. Finally, we demonstrate how the Per-Fide Corpus can be used in contrastive and translation studies through a case study of pronominal causative constructions from a French-Portuguese contrastive perspective.

Similar resources

The Per-Fide Corpus : A new Resource for Corpus-Based Terminology, Contrastive Linguistics and Translation Studies

The Per-Fide project is a joint collaboration between researchers at the Department of Informatics and the Institute of Arts and Humanities at the University of Minho, Portugal. The acronym Per-Fide stands for Portuguese (P) in parallel with 6 languages: English (E), Russian (R), French (F), Italian (I), German/Deutsch (D) and Spanish/Español (E). First, we expound on the role of the Per-Fide ...

Language independent and language adaptive large vocabulary speech recognition

This paper describes the design of a multilingual speech recognizer using an LVCSR dictation database which has been collected under the project GlobalPhone. This project at the University of Karlsruhe investigates LVCSR systems in 15 languages of the world, namely Arabic, Chinese, Croatian, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, Tamil, and Tu...

The Presence and Influence of English in the Portuguese Financial Media

As the lingua franca of the 21st century, English has become the main language for intercultural communication for those wanting to embrace globalization. In Portugal, it is the second language of most public and private domains influencing its culture and discourses. Language contact situations transform languages by the incorporations they make from other languages and Portugal has...

Multilingual and Crosslingual Speech Recognition

This paper describes the design of a multilingual speech recognizer using an LVCSR dictation database which has been collected under the project GlobalPhone. This project at the University of Karlsruhe investigates LVCSR systems in 15 languages of the world, namely Arabic, Chinese, Croatian, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, Tamil, and Tu...

SMT and Hybrid systems of the QTLeap project in the WMT16 IT-task

This paper describes 12 systems submitted to the WMT16 IT-task, covering six different languages, namely Basque, Bulgarian, Dutch, Czech, Portuguese and Spanish. All these systems were developed under the scope of the QTLeap project and share a common strategy. For each language two different systems were submitted, namely a phrase-based MT system built using Moses, and a sys...

Publication date: 2010